Document Categorization using Multilingual Associative Networks based on Wikipedia
نویسندگان
چکیده
Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual associative networks in tasks related to document categorization by comparing the results of both types of associative network on a multilingual dataset.
منابع مشابه
Hierarchical Document Categorization Using Associative Networks
Associative networks are a connectionist language model with the ability to handle dynamic data. We used two associative networks to categorize random sets of related Wikipedia articles with only their raw text as input. We then compared the resulting categorization to a gold standard: the manual categorization by Wikipedia authors and used a neural network as a baseline. We also determined a h...
متن کاملUsing Wikipedia with associative networks for document classification
We demonstrate a new technique for building associative networks based on Wikipedia, comparing them to WordNet-based associative networks that we used previously, finding the Wikipedia-based networks to perform better at document classification. Additionally, we compare the performance of associative networks to various other text classification techniques using the Reuters-21578 dataset, estab...
متن کاملGrouping by association: using associative networks for document categorization
In this thesis we describe a method of using associative networks for automatic document grouping. Associative networks are networks of ideas or concepts in which each concept is linked to concepts that are semantically similar to it. By activating concepts in the network based on the text of a document and spreading this activation to related concepts, we can determine which concepts are relat...
متن کاملUsing Natural Language Processing to Improve Document Categorization with Associative Networks
Associative networks are a connectionist language model with the ability to handle large sets of documents. In this research we investigated the use of natural language processing techniques (part-of-speech tagging and parsing) in combination with Associative Networks for document categorization and compare the results to a TF-IDF baseline. By filtering out unwanted observations and preselectin...
متن کاملKnowledge Transfer across Multilingual Corpora via Latent Topics
This paper explores bridging the content of two different languages via latent topics. Specifically, we propose a unified probabilistic model to simultaneously model latent topics from bilingual corpora that discuss comparable content and use the topics as features in a cross-lingual, dictionary-less text categorization task. Experimental results on multilingual Wikipedia data show that the pro...
متن کامل